An Analysis of NFL Offensive Stats
Introduction
Since I was a little kid, I have been a huge fan of the Kansas City Chiefs football team. I’ve watched the team go through several different players and many up-and-down seasons. I have also followed the NFL for several years, so I chose an NFL-centered dataset.
When looking at the player and team datasets, my main goal is analyzing NFL stats at both the player and team level. I have seen many different types of players, so I am interested in what the stats look like for each position, how player stats have changed over the four years from 2019 to 2022, and how those stats contribute to a team’s performance.
Data Overview & Quality
I am using a dataset from kaggle.com that shows the offensive stats for every offensive player in every NFL game from 2019 to 2022. The dataset has 69 columns that cover a variety of stats for different positions, and 19,973 observations. Each observation corresponds to one player’s stats during one NFL game within those 4 years. There are both categorical and numerical variables, although the dataset has many more numerical variables.
Only one variable, “Vegas Favorite”, had missing values. This variable indicates whether the player was considered a favorite in the betting odds, so it’s not entirely surprising that some of its values are missing. I don’t believe the missing values in this one variable will impact the analysis, because the explorations below do not rely on it.
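A check like the one described above can be sketched in base R. This is only an illustration on a made-up toy data frame; the column name `vegas_favorite` is an assumption, not necessarily the name used in the Kaggle file.

```r
# Toy stand-in for the real data; all values are made up for illustration.
toy <- data.frame(
  player         = c("A", "B", "C", "D"),
  rec_yds        = c(12, 45, 0, 88),
  vegas_favorite = c(TRUE, NA, FALSE, NA),  # hypothetical column name
  stringsAsFactors = FALSE
)

# Count the missing values in every column.
na_counts <- colSums(is.na(toy))

# Keep only the columns that actually contain missing values.
cols_with_na <- na_counts[na_counts > 0]
print(cols_with_na)
```

Running the same two lines on the full dataset would list every column with at least one missing value, which makes it easy to confirm that only one variable is affected.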
A little bit of cleaning had to go into this dataset. There were a few defensive players included in the dataset, so I removed those. I also removed a few columns that were not necessary for my data exploration, such as “player_id”, “team_abbr”, and a few stats that did not relate to offensive player performance. Lastly, I mutated the game id column so that it shows just the year of the game, rather than the long id.
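The cleaning steps above could look roughly like the following base R sketch. It runs on a made-up toy data frame, and it makes two assumptions that are not confirmed by the text: that the game id starts with the four-digit year, and that the offensive position codes are the ones listed in `offense_positions`.

```r
# Toy stand-in for the raw data; ids and values are made up for illustration.
toy <- data.frame(
  game_id   = c("201909080aaa", "202011290bbb", "202201020ccc"),
  player_id = c("p1", "p2", "p3"),
  team_abbr = c("KAN", "KAN", "CIN"),
  position  = c("WR", "LB", "TE"),
  rec_yds   = c(50, 0, 35),
  stringsAsFactors = FALSE
)

# Assumed list of offensive position codes (not taken from the dataset docs).
offense_positions <- c("QB", "RB", "FB", "WR", "TE", "C", "G", "T")

# 1. Drop rows for defensive players.
cleaned <- toy[toy$position %in% offense_positions, ]

# 2. Drop columns that are not needed for the exploration.
cleaned <- cleaned[, !(names(cleaned) %in% c("player_id", "team_abbr"))]

# 3. Replace the long game id with the year.
#    Assumption: the first four characters of the id are the year.
cleaned$year <- as.integer(substr(cleaned$game_id, 1, 4))
cleaned$game_id <- NULL

print(cleaned)
```

If the real game id has a different layout, only the `substr()` call in step 3 would need to change.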
Explorations
Receiving Analysis
First, I want to look at receiving.
Frequency of Receiving Yards
To begin, I am curious about what the distribution of receiving yards looks like. There are two positions that receive: wide receivers and tight ends. I want to start with the distribution for both positions combined.
Figure 1 shows that the majority of receiving yards lie in the 0-50 range. However, the distribution is right-skewed, and there are outliers, which are players with more than 150 yards in a game.
Next, I want to look at the distribution of receiving yards by position.
Figure 2 reveals that tight ends have lower overall receiving stats than wide receivers. This is not surprising, since tight ends have two jobs: blocking and receiving. Not all tight ends catch, but all wide receivers catch. This difference in the nature of the positions explains the difference in the range of receiving yards between the two.
Wide Receiver Receiving Yards Across Years
Next, I want to explore how receiving yards have changed across years. Since tight ends and wide receivers are fairly different positions, I am going to explore each position separately. First, I am going to look at wide receivers.
| year | mean_rec_yds |
|---|---|
| 2019 | 33.55330 |
| 2020 | 34.26984 |
| 2021 | 31.89035 |
| 2022 | 32.30046 |
The receiving yards of wide receivers by year show an interesting trend. The average went up in 2020, but then went down in 2021 and stayed low in 2022. However, the average has not changed much overall: it fell by about 1.3 yards from 2019 to 2022, and by about 2.4 yards from the 2020 peak to the 2021 low. This suggests that the emphasis on throwing in the NFL has remained fairly consistent, and that the talent in the NFL has also stayed consistent.
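A yearly average table like the one above can be produced with a grouped mean. The following is a minimal base R sketch on made-up numbers; the column names `position`, `year`, and `rec_yds` follow the tables in this report, and the values are purely illustrative.

```r
# Toy stand-in for the cleaned data; values are made up for illustration.
toy <- data.frame(
  position = c("WR", "WR", "WR", "WR", "TE", "TE"),
  year     = c(2019, 2019, 2020, 2020, 2019, 2020),
  rec_yds  = c(20, 40, 10, 70, 5, 15),
  stringsAsFactors = FALSE
)

# Keep wide receivers only, then average receiving yards within each year.
wr <- toy[toy$position == "WR", ]
mean_by_year <- aggregate(rec_yds ~ year, data = wr, FUN = mean)
names(mean_by_year)[2] <- "mean_rec_yds"

print(mean_by_year)
```

Changing the filter to `"TE"` gives the matching table for tight ends.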
The range in Figure 3 is also fairly consistent from year to year. There seem to be many outliers each year, and there is one extreme outlier in each of 2020 and 2022.
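Assuming Figure 3 is a standard box plot, points are usually drawn as outliers when they fall more than 1.5 times the interquartile range beyond the box. The sketch below applies that common rule to made-up single-game totals with base R's `boxplot.stats()`; it is an illustration of the rule, not necessarily the exact setting used for the figure.

```r
# Made-up single-game receiving totals for one year (illustration only).
rec_yds <- c(5, 12, 18, 25, 30, 33, 41, 47, 52, 269)

# boxplot.stats() returns the five box plot statistics and the points
# that fall outside the whiskers (default coefficient: 1.5).
stats <- boxplot.stats(rec_yds)
outliers <- stats$out

# Upper fence: upper hinge plus 1.5 times the distance between the hinges.
lower_hinge <- stats$stats[2]
upper_hinge <- stats$stats[4]
upper_fence <- upper_hinge + 1.5 * (upper_hinge - lower_hinge)

print(upper_fence)
print(outliers)  # 269 is the only value above the upper fence
```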
Top Wide Receivers
Now, I am interested in finding out which players these 2 outliers are.
| player | position | year | team | rec_yds |
|---|---|---|---|---|
| Tyreek Hill | WR | 2020 | KAN | 269 |
| Tyler Lockett | WR | 2020 | SEA | 200 |
| player | position | year | team | rec_yds |
|---|---|---|---|---|
| Ja'Marr Chase | WR | 2022 | CIN | 266 |
| Gabriel Davis | WR | 2022 | BUF | 201 |
After looking into it more, I found that the outlier in 2020 was Tyreek Hill and the outlier in 2022 was Ja’Marr Chase.
After exploring further, I learned that Tyreek Hill got 269 receiving yards while he played for the Kansas City Chiefs in a game against the Tampa Bay Buccaneers. I personally remember watching this game, and I recalled it as soon as I saw the outlier, which is why I wanted to explore further. Ja’Marr Chase got 266 receiving yards for the Cincinnati Bengals in a game against the Kansas City Chiefs.
Both of these games rank among the top single-game receiving performances of all time, with Hill holding the 14th most receiving yards in a single game and Chase holding the 16th most. Taking these stats into consideration, it makes sense that the outliers look so significant in the distribution.
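The top-games tables above come down to filtering one year and sorting by yards. Here is a minimal base R sketch of that idea; `top_n_games()` is a hypothetical helper written for this illustration, and the players and values are made up.

```r
# Toy stand-in for the wide receiver rows; values are made up.
toy <- data.frame(
  player  = c("A", "B", "C", "D", "E", "F"),
  year    = c(2020, 2020, 2020, 2022, 2022, 2022),
  rec_yds = c(90, 210, 35, 60, 15, 180),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n highest single-game totals in a given year.
top_n_games <- function(df, yr, n = 2) {
  one_year <- df[df$year == yr, ]
  ranked <- one_year[order(one_year$rec_yds, decreasing = TRUE), ]
  head(ranked, n)
}

top_2020 <- top_n_games(toy, 2020)
top_2022 <- top_n_games(toy, 2022)
print(top_2020)
print(top_2022)
```

The same helper works for passing or rushing by swapping the sorted column.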
Tight End Receiving Yards Across Years
Now that I have explored the receiving yards of wide receivers from 2019 to 2022, I am interested in exploring the change in receiving yards of tight ends across years.
| year | mean_rec_yds |
|---|---|
| 2019 | 18.80272 |
| 2020 | 17.57289 |
| 2021 | 17.35009 |
| 2022 | 17.09881 |
The average receiving yards of tight ends are more consistent from year to year than those of wide receivers, staying between about 17 and 19 yards, with the highest average in 2019.
When looking at Figure 4, there are similar ranges across years. There also seem to be several outliers each year. However, 2020 and 2021 have higher outliers than the other two years.
Top Tight Ends
As with wide receivers, I am curious to see which tight ends the outliers are.
| player | position | year | team | rec_yds |
|---|---|---|---|---|
| Darren Waller | TE | 2020 | LVR | 200 |
| George Kittle | TE | 2020 | SFO | 183 |
| Travis Kelce | TE | 2020 | KAN | 159 |
| Darren Waller | TE | 2020 | LVR | 150 |
| Travis Kelce | TE | 2020 | KAN | 136 |
| player | position | year | team | rec_yds |
|---|---|---|---|---|
| Travis Kelce | TE | 2021 | KAN | 191 |
| George Kittle | TE | 2021 | SFO | 181 |
| Kyle Pitts | TE | 2021 | ATL | 163 |
| George Kittle | TE | 2021 | SFO | 151 |
| David Njoku | TE | 2021 | CLE | 149 |
Unlike wide receivers, there aren’t any extreme outliers, but there are several games that are far above average. The average receiving yards per tight end is 17-18, yet there are tight ends with more than 150 yards in a game. When looking at the top 5 for the two outlier years, several players appear repeatedly with high numbers, such as Travis Kelce and George Kittle.
| player | avg_rec_yds |
|---|---|
| Travis Kelce | 83.50909 |
| Darren Waller | 70.04545 |
| George Kittle | 66.82927 |
| Mark Andrews | 61.67347 |
| Kyle Pitts | 60.35294 |
When looking at the tight ends with the best overall receiving yards, the data is consistent with the outliers in 2020 and 2021. Travis Kelce, Darren Waller, George Kittle, and Kyle Pitts are all among the top five in average receiving yards across all games, so it makes sense that those 4 make up many of the outliers.
It also makes sense that tight ends show a wider gap between typical and top performances than wide receivers do, since the position has two functions, as mentioned earlier. Since not all tight ends receive, there is a large disparity between the average receiving yards among all tight ends and the average receiving yards among the top tight ends.
Tight Ends vs. Wide Receivers
Now that we have explored the receiving yards across years among tight ends and wide receivers individually, I am interested in comparing them during each year.
Figure 5 and Figure 6 make it clear that wide receivers have much higher receiving yards than tight ends, both in total and on average. Neither of these graphs is surprising.
First, teams typically use about 3 wide receivers at a time, compared to 1 tight end. This explains why the total receiving yards are higher. Second, since the positions function differently, it is understandable that the average receiving yards are higher for wide receivers.
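The difference between the total and the average comparison can be seen in a small base R sketch. The numbers below are made up so that there are three wide receiver rows for every tight end row; the column names follow the tables in this report.

```r
# Toy data: three WR rows and one TE row in the same year (made-up values).
toy <- data.frame(
  position = c("WR", "WR", "WR", "TE"),
  year     = c(2019, 2019, 2019, 2019),
  rec_yds  = c(30, 40, 50, 20),
  stringsAsFactors = FALSE
)

# Total and mean receiving yards for each position and year.
totals <- aggregate(rec_yds ~ position + year, data = toy, FUN = sum)
means  <- aggregate(rec_yds ~ position + year, data = toy, FUN = mean)

# Put both summaries side by side.
by_position <- merge(totals, means, by = c("position", "year"),
                     suffixes = c("_total", "_mean"))
print(by_position)
```

In this toy example the wide receiver total is six times the tight end total, but the mean is only twice as high, which shows how the number of players at a position inflates totals more than averages.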
Receiving Yards vs. Other Receiving Variables
When looking at the variables in Figure 7, most pairs do not appear to be strongly correlated with one another. The only pair that looks to have a fairly strong relationship is receiving yards and receiving long.
Receiving Yards vs. Receiving Long
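    library(dplyr)  # filter(), select()
    library(knitr)  # kable()

    # Correlation matrix of rec_yds and rec_long for wide receivers and tight ends
    nfl_rec <- nfloffensiveplayers_new |>
      filter(position %in% c("WR", "TE")) |>
      select(rec_yds, rec_long) |>
      cor()
    kable(nfl_rec)

| | rec_yds | rec_long |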
|---|---|---|
| rec_yds | 1.0000000 | 0.8473774 |
| rec_long | 0.8473774 | 1.0000000 |
Figure 8 shows a strong relationship between the two variables, which is confirmed by the table showing a correlation of about 0.85 between receiving yards and receiving long. This correlation makes sense, since the “receiving long” variable records a player’s longest reception in the game. That catch counts toward the player’s total, so a longer longest reception goes along with more receiving yards.
Passing Analysis
Next, I want to examine passing.
Passing Yards Across Years
First, I am going to analyze the passing yards across years.
| year | mean_pass_yds |
|---|---|
| 2019 | 212.3322 |
| 2020 | 204.7006 |
| 2021 | 199.5871 |
| 2022 | 202.2571 |
The average passing yards in the NFL are interesting. You have to pass a ball to receive it, so I am surprised that the averages vary more from year to year in passing than in receiving. There was a downward trend in average passing yards from 2019 to 2021, but then it went back up in 2022. I think the disparity between passing and receiving could be explained by the fact that a team has only one quarterback on the field, compared to multiple receivers, so a quarterback’s yards are split among several players.
Figure 9 shows that the range of passing yards is pretty consistent, similar to the averages, although there is a small downward trend. There are almost no outliers, except for 1 extreme outlier in 2019 and 2 extreme outliers in 2021.
Top Quarterbacks
Now, I am interested in exploring who the 3 quarterback outliers are.
| player | position | year | team | pass_yds |
|---|---|---|---|---|
| Jared Goff | QB | 2019 | LAR | 517 |
| Dak Prescott | QB | 2019 | DAL | 463 |
| Matt Schaub | QB | 2019 | ATL | 460 |
| Jameis Winston | QB | 2019 | TAM | 458 |
| Jameis Winston | QB | 2019 | TAM | 456 |
| player | position | year | team | pass_yds |
|---|---|---|---|---|
| Joe Burrow | QB | 2021 | CIN | 525 |
| Ben Roethlisberger | QB | 2021 | PIT | 501 |
| Dak Prescott | QB | 2021 | DAL | 445 |
| Lamar Jackson | QB | 2021 | BAL | 442 |
| Derek Carr | QB | 2021 | LVR | 435 |
The outlier in 2019 was Jared Goff when he played for the Los Angeles Rams. The outliers in 2021 were Joe Burrow when he played for the Cincinnati Bengals and Ben Roethlisberger when he played for the Pittsburgh Steelers. These are outliers since each is more than 50 yards above the next-highest game that year.
Similar to the wide receiver stats, these games all rank among the highest single-game totals in NFL history. Burrow’s game was the 4th most passing yards of all time in an NFL game, Roethlisberger’s was the 5th most, and Goff’s was the 10th most. These rankings help explain why the outliers seem so significant.
Correlation Between Passing Yards and Other Passing Variables
Now that we have analyzed passing yards, I am interested in analyzing the relationship between passing yards and the other variables related to passing: pass completions, pass attempts, pass interceptions, passes sacked, and pass rating.
Figure 10 shows that the three variables with the highest correlations to each other are pass completions, pass yards, and pass attempts. I am interested in further exploring the relationships among those 3 variables.
Pass Completions vs. Pass Yards vs. Pass Attempts
First, I want to further explore the correlation between passing yards, pass completions, and pass attempts for quarterbacks.
Figure 11 explores the relationship between all 3 at the same time. The variables all have high correlations with each other, and each pair shows a positive relationship in the graphs.
Although they are all similar, pass completions and pass attempts have a slightly higher correlation than the other pairs. Pass yards and pass completions have the second highest correlation, and pass yards and pass attempts have the lowest correlation.
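Ranking the pairs by correlation, as described above, can be sketched in base R as follows. The data is made up, and the column names `pass_att` and `pass_cmp` are assumptions (only `pass_yds` appears in the tables of this report), so the ordering printed here says nothing about the real dataset.

```r
# Toy quarterback games; values and two of the column names are assumptions.
toy <- data.frame(
  pass_att = c(20, 30, 35, 40, 45),
  pass_cmp = c(12, 20, 22, 28, 30),
  pass_yds = c(150, 240, 230, 330, 310)
)

# Pearson correlation matrix for all three variables.
cor_matrix <- cor(toy)

# Pull out each distinct pair once (upper triangle) and sort by correlation.
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)
pair_table <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r    = cor_matrix[idx],
  stringsAsFactors = FALSE
)
pair_table <- pair_table[order(pair_table$r, decreasing = TRUE), ]
print(pair_table)
```

Listing the pairs in a sorted table makes the "highest, second highest, lowest" comparison easier to read than a full matrix.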
Rushing Analysis
Finally, I am going to explore rushing in this dataset.
Distribution of Rushing Yards
There are two different positions that can rush the ball: running backs and fullbacks. To start, I want to examine the distribution of rushing yards of both positions.
Figure 12 shows the distribution of rushing yards per game and Figure 13 shows the average rushing yards per player. Both Figure 12 and Figure 13 have similar overall distributions. They are both right-skewed and unimodal. However, the frequencies for the per-player averages are lower than for the per-game totals, although that is to be expected since there are fewer players than individual games.
I am personally not surprised by the right skew. Rushing is generally not meant to gain a lot of yards, so it makes sense that the majority of both the rushing yards per game and the average rushing yards per player are in the 0-25 range. However, some plays end up gaining a large number of rushing yards, which explains the right skew and the outliers on both graphs. There is one outlier in average yards, which will be explored below in the “Top Running Backs” section.
Distribution of Rushing Yards by Position
Next, I want to compare the distribution of rushing yards between the 2 positions.
Figure 14 and Figure 15 produce consistent results. In both graphs, running backs have significantly higher rushing yards than fullbacks. At first, these results surprised me since running backs and fullbacks are virtually the same position. However, after analyzing it more and looking at the dataset, I realized that teams have significantly more running backs than fullbacks and they use running backs more. This explains the difference in both total yards and average yards.
Analysis of Rushing Yards by Year
Now that we have analyzed the distribution of rushing yards, I want to examine how rushing yards have changed over the 4 years in the dataset. Since running backs and fullbacks are extremely similar positions, I am going to summarize both positions together for this analysis.
| year | mean_rush_yds |
|---|---|
| 2019 | 29.41245 |
| 2020 | 29.64167 |
| 2021 | 29.61589 |
| 2022 | 29.08834 |
There is extremely high consistency when it comes to rushing yards in the NFL. The average remained around 29.4 yards across the 4 years, never differing by more than about half a yard. The average is also slightly lower than the wide receiver receiving average (about 32 to 34 yards) and far lower than the passing average (about 200 to 212 yards), which makes sense since rushing generally does not lead to as many yards gained as receiving does.
Figure 16 shows a similar range among all four years. However, there are many outliers in each year. 2021 has 2 outliers that are higher than those in any other year.
Top Running Backs
Similar to the other positions, I am interested in seeing which running backs performed well enough to be the outliers.
| player | position | year | team | rush_yds |
|---|---|---|---|---|
| Jonathan Taylor | RB | 2021 | IND | 253 |
| Derrick Henry | RB | 2021 | TEN | 250 |
| Dalvin Cook | RB | 2021 | MIN | 205 |
| Jonathan Taylor | RB | 2021 | IND | 185 |
| Derrick Henry | RB | 2021 | TEN | 182 |
Jonathan Taylor had 253 yards in a single game while playing for the Indianapolis Colts, and Derrick Henry had 250 yards while playing for the Tennessee Titans. Both are over 200 yards above the average and at least 45 yards above the next-highest game that year (205 yards), which explains why they stand out in the graph.
Unsurprisingly, these games also rank among the highest single-game totals in the NFL. Taylor had the 9th most rushing yards in an individual game in NFL history, and Henry had the 13th. It is interesting to compare the stand-out outliers to NFL records, since it puts into perspective how extreme the outliers truly are.
| player | avg_rush_yds |
|---|---|
| Derrick Henry | 114.88636 |
| Jonathan Taylor | 92.66667 |
These results are consistent with the outliers, since Jonathan Taylor and Derrick Henry not only had the two highest performing games, but were also the top 2 running backs by average rushing yards across the 4 years in the dataset.
Rush Yards vs. Other Rush Variables
Figure 17 shows that, unlike the passing variables, the rushing variables are not as strongly correlated with each other. The only pairs that seem to have a fairly strong correlation are rushing yards and rushing attempts, and rushing yards and rushing yards before contact.
Rush Attempts vs. Rush Yards
Now, I want to further explore the correlation between rushing yards and rush attempts for running backs.
| | rush_att | rush_yds |
|---|---|---|
| rush_att | 1.0000000 | 0.8748997 |
| rush_yds | 0.8748997 | 1.0000000 |
Figure 18 shows a high correlation between rushing yards and rushing attempts, about 0.87 according to the table. This is consistent with the correlation shown by the heat map. It also makes sense that these two variables are related. The more attempts someone has at rushing, the more yards they will tend to get. That said, it also makes sense that the correlation is not extremely close to 1, since an attempt doesn’t guarantee that the running back will gain yards on the play.
Rush Yards vs. Rush Yards Before Contact
Next, I want to explore the relationship between rush yards and rush yards before contact.
| | rush_yds | rush_yds_before_contact |
|---|---|---|
| rush_yds | 1.0000000 | 0.9006858 |
| rush_yds_before_contact | 0.9006858 | 1.0000000 |
Figure 19 demonstrates that there is also a strong correlation between these two variables, at about 0.90. However, I am surprised the correlation is not even higher. One of the variables represents the total rush yards in a game, and the other represents the total rush yards in a game before the player is hit. I had assumed that when the player gets hit, the play is normally over, which is why I expected the correlation to be closer to 1. The gap suggests that yards gained after contact make up a meaningful and varying share of rushing totals.
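One way to look into this gap is to compute the yards gained after contact as the difference between the two variables. The base R sketch below does this on made-up numbers; `yds_after_contact` is a hypothetical derived column, not a column confirmed to exist in the dataset.

```r
# Toy running back games; values are made up for illustration.
toy <- data.frame(
  rush_yds                = c(40, 85, 12, 120, 60),
  rush_yds_before_contact = c(25, 50, 10, 70, 45)
)

# Hypothetical derived column: yards gained after the first contact.
toy$yds_after_contact <- toy$rush_yds - toy$rush_yds_before_contact

# Correlation between total yards and yards before contact.
r <- cor(toy$rush_yds, toy$rush_yds_before_contact)

# Share of all rushing yards that came after contact.
share_after <- sum(toy$yds_after_contact) / sum(toy$rush_yds)

print(toy)
print(r)
print(share_after)
```

Whenever the after-contact yards differ from game to game, the correlation between the two original variables falls below 1, even though one is a component of the other.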
Overall Analysis
Now that we have explored rushing, passing, and receiving more deeply, I am interested in analyzing the dataset as a whole.
Frequency of Positions
First, I am interested in looking at how common each position is in the dataset.
Figure 20 highlights that wide receivers are the most common offensive position in the dataset, which makes sense since there are more wide receivers on the field at a time than the other positions. Quarterback was the least common position, which is also not surprising since teams normally only have 1 backup quarterback, and their main “franchise” quarterback remains on the field at all times. This is unlike the other positions where the players consistently switch out, so they need more backups.
I was surprised that there were more tight ends in the dataset than running backs. I see more running backs switch out than tight ends when I’m watching football, so I always assumed there would be more backup RBs than TEs. However, this assumption is untrue, as shown by the pie chart.
In Figure 21, there are more tackles than centers, which makes sense since there are more tackles on the field than centers at a time. It is surprising that there are more tackles than guards, since both positions have 2 on the field at a time. I would assume it’s because tackles get injured more easily since they are on the ends of the offensive line.
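Since each row of the dataset is one player in one game, counting rows per position measures game appearances rather than distinct players. The base R sketch below shows both counts on made-up data; the `player` and `position` column names follow the tables in this report.

```r
# Toy player-game rows; players and positions are made up.
toy <- data.frame(
  player   = c("A", "A", "A", "B", "C", "C", "D"),
  position = c("WR", "WR", "WR", "WR", "TE", "TE", "QB"),
  stringsAsFactors = FALSE
)

# Rows per position (game appearances), most common first.
row_counts <- sort(table(toy$position), decreasing = TRUE)

# The same counts as percentages, as a pie chart would show them.
row_shares <- round(100 * prop.table(row_counts), 1)

# Distinct players per position.
unique_players <- tapply(toy$player, toy$position,
                         function(p) length(unique(p)))

print(row_counts)
print(row_shares)
print(unique_players)
```

Comparing the two counts helps separate "this position appears in many games" from "this position has many different players".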
Frequency of Positions by Year
Figure 22 shows that the frequency of each position remained fairly consistent throughout each year, except for the decrease in 2022. This disparity can likely be explained by the fact that this dataset was created in 2022, so it likely does not cover the full year and misses players who joined teams towards the end of the year.
Conclusion
Throughout this exploration, I looked at rushing, receiving, and passing. I found that the average yards of each have remained fairly consistent across the four years, although passing yards have varied the most. It was surprising that passing yards varied more than receiving yards, since the two should in theory be closely related. Each of the variables also had outliers. When these outliers were explored further, they turned out to be individual games that rank among the highest single-game totals in NFL history. Two of the categories, receiving and rushing, centered around two different positions. In both cases, one position outperformed the other, although for different reasons. Wide receivers had higher receiving stats than tight ends since not all tight ends receive, and running backs had higher rushing stats than fullbacks since running backs are used more. I had expected the difference in receiving yards since I follow the tight end position, but I did not expect the difference in rushing yards.
When looking at the three categories, there were high correlations between variables in each one. Receiving yards was heavily correlated with receiving long, and rushing yards was heavily correlated with both rushing yards before contact and rush attempts. Additionally, passing yards, pass attempts, and pass completions all had strong correlations with each other. These correlations all made sense logically, and I expected most of them.
In the future, it would be interesting to take the player stats analysis and see how it impacted each of the games they played in. You could use this comparison to identify trends between certain rushing, passing, or receiving stats and whether or not the team won, which could be very useful for offensive coaches.
References
Fernandez, Daniel. “NFL Offensive Stats 2019 - 2022.” Kaggle, 23 Aug. 2022, www.kaggle.com/datasets/dtrade84/nfl-offensive-stats-2019-2022.
“NFL Passing Yards Single Game Leaders.” Pro Football Reference, Sports Reference, www.pro-football-reference.com/leaders/pass_yds_single_game.htm. Accessed 6 Dec. 2023.
“NFL Receiving Yards Single Game Leaders.” Pro Football Reference, Sports Reference, www.pro-football-reference.com/leaders/rec_yds_single_game.htm. Accessed 6 Dec. 2023.
“NFL Rushing Yards Single Game Leaders.” Pro Football Reference, Sports Reference, www.pro-football-reference.com/leaders/rush_yds_single_game.htm. Accessed 6 Dec. 2023.